Explainable global error weighted on feature importance: The xGEWFI metric to evaluate the error of data imputation and data augmentation


Abstract

Evaluating data imputation and augmentation performance is a critical issue in data science. In statistics, methods such as the Kolmogorov-Smirnov (K-S) test, Cramér-von Mises $W^2$, Anderson-Darling $A^2$, Pearson’s $\chi^2$, and Watson’s $U^2$ have existed for decades to compare the distributions of two datasets. In the context of data generation, typical evaluation metrics have the same flaw: they calculate each feature’s error and the global error on the generated data without weighting the error by feature importance. In most cases, the importance of the features is imbalanced, which can induce a bias in the feature and global errors. This paper proposes a novel metric named “Explainable Global Error Weighted on Feature Importance” (xGEWFI). The new metric is tested in a whole preprocessing method that (1) processes the outliers, (2) imputes the missing data, and (3) augments the data. At the end of the process, the xGEWFI error is calculated. The distribution error between the original and generated data is calculated using a Kolmogorov-Smirnov test (K-S test) for each feature. Those results are multiplied by the importance of the respective features, calculated using a Random Forest (RF) algorithm. The metric result is expressed in an explainable format, aiming for an ethical AI. This metric provides a more precise evaluation of a data generation process than if only a K-S test were used.
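The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the per-feature error is the two-sample K-S statistic, that the global error is the sum of the weighted per-feature errors (the abstract does not state the aggregation), and that the feature importances are supplied as plain numbers. The function names `ks_statistic` and `weighted_global_error`, the toy data, and the importance values are all hypothetical. In practice one might obtain the K-S statistic from `scipy.stats.ks_2samp` and the importances from the `feature_importances_` attribute of a fitted scikit-learn random forest.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def weighted_global_error(original, generated, importance):
    """Return (per-feature weighted errors, their sum).

    original, generated: dict mapping feature name -> list of values.
    importance: dict mapping feature name -> weight (assumed to sum to 1).
    Summing the weighted errors is an assumption of this sketch.
    """
    per_feature = {
        name: ks_statistic(original[name], generated[name]) * importance[name]
        for name in original
    }
    return per_feature, sum(per_feature.values())


# Toy data (hypothetical): feature "x" is reproduced exactly, "y" is shifted.
original = {"x": [1, 2, 3, 4], "y": [1, 2, 3, 4]}
generated = {"x": [1, 2, 3, 4], "y": [3, 4, 5, 6]}
importance = {"x": 0.8, "y": 0.2}  # stand-in for Random Forest importances

per_feature, global_error = weighted_global_error(original, generated, importance)
print(per_feature, global_error)
```

In this toy case the badly generated feature "y" has low importance, so its K-S error contributes little to the global error. Swapping the two weights would make the same distribution error count four times as much, which is the effect an unweighted metric cannot show. Keeping the per-feature breakdown alongside the total is what makes the result explainable.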


Similar resources

The effects of error correction methods on pronunciation accuracy

The aim of this research was to identify the most effective error correction method for the accuracy of intonation and word stress in English pronunciation. The study applied four methods of delivering error correction to four groups, three experimental groups and one control group, made up of upper-intermediate students using the first book of Passages. The first group had 15 students, the second 14, the third 15, and the last 16. The course lasted 10 weeks and ...


The effect of explicit versus implicit error correction on writing of Iranian intermediate EFL learners

This thesis examines two methods of correcting the writing errors of adult Iranian language learners at the intermediate level. In the first method (explicit), errors are corrected directly; in the second method (implicit), errors are corrected indirectly. Two groups of 15 learners were used for the study. In each session the learners were given an essay topic. This was repeated over 15 weeks (15 sessions). Comparing the results of this test...

A study on insurer solvency by panel data model: the case of Iranian insurance market

The aim of this thesis is to develop an approach for assessing the solvency of Iranian insurance companies. We use economic data with both time-series and cross-sectional variation, and therefore examine insurer solvency with a panel data model.

The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance

Given the ever-growing spread of fraud in the insurance sector, particularly in automobile insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...

Data mining rules and classification methods in insurance: the case of collision insurance

In Iran, premiums are mostly assigned to insurance contracts according to old rules authorized by the government. In such a situation, predicting premiums by analyzing the database and its characteristics would definitely be a big mistake. Therefore, the most beneficial information one can gather from these data is the amount of loss that occurs during one contract, for predicting insurance ...


Journal

Journal title: Applied Intelligence

Year: 2023

ISSN: 0924-669X, 1573-7497

DOI: https://doi.org/10.1007/s10489-023-04661-x